Demonstrate lazy scheduling #754

Merged
frankmcsherry merged 4 commits into TimelyDataflow:master from frankmcsherry:lazy_scheduling
Mar 13, 2026
Conversation

@frankmcsherry
Member

This PR demonstrates how one could schedule subgraphs less often. It raises some questions about invariants, and we shouldn't race to merge it, but it seems worth talking about.

The idea is that subgraphs could set their notify_me() based on the values of their children, and if none of them require frontier notification then arguably the subgraph doesn't need it either. This is apparently a bit wrong, in that demand-driven progress communication is one reason that subgraphs do need frontier notification: if they are sitting on un-sent progress updates that would only be unlocked by frontier movement, they need to observe it to make progress. This happens with acknowledging messages sent across an IngressNub into the subgraph: they are exchanged as progress information to get the information to all workers, and until this happens they present as capabilities.
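The first half of that idea can be sketched in a few lines. This is an illustration only, not the code in this PR: the `Child` trait, `FixedChild`, and `Subgraph` types below are hypothetical stand-ins, and only the `notify_me()` name comes from the description above.

```rust
/// Hypothetical stand-in for an operator scheduled inside a subgraph.
pub trait Child {
    /// Whether this child wants to be scheduled when its input frontier changes.
    fn notify_me(&self) -> bool;
}

/// A child with a fixed answer, used only to exercise the sketch.
pub struct FixedChild(pub bool);

impl Child for FixedChild {
    fn notify_me(&self) -> bool {
        self.0
    }
}

/// Hypothetical subgraph that owns its children.
pub struct Subgraph {
    pub children: Vec<Box<dyn Child>>,
}

impl Subgraph {
    /// Build a subgraph from a list of per-child answers.
    pub fn with_children(flags: &[bool]) -> Self {
        let children = flags
            .iter()
            .map(|&flag| Box::new(FixedChild(flag)) as Box<dyn Child>)
            .collect();
        Subgraph { children }
    }

    /// The subgraph asks for frontier notification only if some child does.
    /// As the description notes, this alone ignores un-sent progress updates.
    pub fn notify_me(&self) -> bool {
        self.children.iter().any(|child| child.notify_me())
    }
}
```

Under these assumptions, `Subgraph::with_children(&[false, false]).notify_me()` is `false`, while a single interested child flips the answer to `true`.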

This is a quirk that may not be critical. We went back and forth a lot on whether the exchange should happen inside the subgraph or outside it, and I'd love to avoid flipping this without great care. Instead, this PR uses an override: lazy progress exchange is triggered by pending child 0 updates. That makes for spammier messages, but it is potentially worth the trade-off, eventually.
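One possible reading of that override, as a rough sketch: the subgraph flushes progress whenever it has buffered updates for child 0, so it no longer depends on frontier movement to unlock them. The `LazySubgraph` type, its fields, and the `(time, delta)` update representation are assumptions made for illustration and are not taken from the PR.

```rust
/// Hypothetical lazily scheduled subgraph.
pub struct LazySubgraph {
    /// Whether any child asked for frontier notification.
    pub children_want_frontier: bool,
    /// Buffered progress updates for child 0, as (time, count delta) pairs.
    /// This representation is an assumption for the sketch.
    pub pending_child_zero: Vec<(u64, i64)>,
}

impl LazySubgraph {
    /// Frontier notification is requested only on behalf of the children.
    pub fn notify_me(&self) -> bool {
        self.children_want_frontier
    }

    /// Progress exchange is triggered by pending child 0 updates,
    /// independently of whether the frontier has moved.
    pub fn should_exchange_progress(&self) -> bool {
        !self.pending_child_zero.is_empty()
    }

    /// Drain the buffered updates, as if handing them to the exchange.
    pub fn take_pending(&mut self) -> Vec<(u64, i64)> {
        std::mem::take(&mut self.pending_child_zero)
    }
}
```

The trade-off mentioned above shows up here as more frequent, smaller exchanges: every buffered update is a reason to communicate, rather than waiting for a frontier change to batch them.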

Where this is meant to be headed is a scheduling option that communicates "don't schedule me until the frontier reaches, or passes, my held capabilities". That would (I think) bypass this problem, and comport with most operators that want notification. We would extend notify_me() to return an enum rather than a bool, and it should (other issues notwithstanding) unlock the ability to use regions to avoid progress work for operators that do not have anything to do.
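A minimal sketch of what an enum-valued answer could look like, assuming totally ordered `u64` timestamps. The `FrontierInterest` name appears later in this conversation, but the variants, the `wake_on_frontier` function, and the `combine` helper below are assumptions for illustration, not the library's definitions.

```rust
/// Hypothetical levels of interest in frontier changes, ordered from
/// least to most demanding so that `max` picks the strongest request.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum FrontierInterest {
    /// Never schedule on frontier changes.
    Never,
    /// Schedule once the frontier reaches or passes a held capability.
    IfCapability,
    /// Schedule on every frontier change.
    Always,
}

/// Decide whether a frontier change should wake an operator.
/// `frontier` is `None` when the input frontier is empty (input complete).
pub fn wake_on_frontier(
    interest: FrontierInterest,
    held_capabilities: &[u64],
    frontier: Option<u64>,
) -> bool {
    match interest {
        FrontierInterest::Never => false,
        FrontierInterest::Always => true,
        FrontierInterest::IfCapability => held_capabilities.iter().any(|&cap| match frontier {
            // An empty frontier has passed every time.
            None => true,
            // The frontier has reached or passed this capability.
            Some(f) => f >= cap,
        }),
    }
}

/// A subgraph could report the strongest interest among its children.
pub fn combine(children: &[FrontierInterest]) -> FrontierInterest {
    children
        .iter()
        .copied()
        .max()
        .unwrap_or(FrontierInterest::Never)
}
```

With this shape, a region whose children all answer `Never` would itself answer `Never`, which is the case where progress work could be skipped for operators that have nothing to do.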

@frankmcsherry frankmcsherry marked this pull request as ready for review March 13, 2026 09:54
Member

@antiguru antiguru left a comment

Seems good, and the description checks out. I feel this is relatively low-risk as it's opt-in, and I presume we'd learn about it being wrong with a bit of testing.

@frankmcsherry
Member Author

The low risk thing is my hope. It seems hard to opt in incorrectly, as you need to actively track down the FrontierInterest enum at the moment.

@frankmcsherry frankmcsherry merged commit 09d0b87 into TimelyDataflow:master Mar 13, 2026
8 checks passed
@frankmcsherry frankmcsherry deleted the lazy_scheduling branch March 13, 2026 13:33